OCR - Optical Character Recognition

Authors

  • S. Kahan
  • C. Y. Suen
  • G. Nagy
  • T. Pavlidis
  • J. Swartz
  • Y. P. Wang
  • R. Plamondon
  • A. A. Verikas
  • S. J. Vilunas
Abstract

Character recognition techniques associate a symbolic identity with the image of a character. Character recognition is commonly referred to as optical character recognition (OCR), as it deals with the recognition of optically processed characters. The modern version of OCR appeared in the middle of the 1940s with the development of digital computers. OCR machines have been commercially available since the middle of the 1950s. Today OCR systems are available both as hardware devices and as software packages, and a few thousand systems are sold every week. In a typical OCR system, input characters are digitized by an optical scanner. Each character is then located and segmented, and the resulting character image is fed into a preprocessor for noise reduction and normalization. Certain characteristics are then extracted from the character for classification. Feature extraction is critical, and many different techniques exist, each with its own strengths and weaknesses. After classification, the identified characters are grouped to reconstruct the original symbol strings, and context may then be applied to detect and correct errors. Optical character recognition has many practical applications. The main areas where OCR has been of importance are text entry (office automation), data entry (banking environment) and process automation (mail sorting). The state of the art in OCR has moved from primitive schemes for limited character sets to the application of more sophisticated techniques for omnifont and handprint recognition. The main problems in OCR usually lie in the segmentation of degraded symbols that are joined or fragmented. Generally, the accuracy of an OCR system depends directly on the quality of the input document. Three figures are used to rate OCR systems: the correct classification rate, the rejection rate and the error rate. Performance should be rated by the system's error rate, because these errors go undetected by the system and must be located manually for correction. In spite of the great number of algorithms that have been developed for character recognition, the problem is not yet solved satisfactorily, especially in cases where there are no strict limitations on the handwriting or the quality of print. So far, no recognition algorithm can compete with humans in quality. However, because an OCR machine is able to read much faster, it is still attractive. In the future, the area of recognition of constrained print is expected to decrease. Emphasis will then be on the recognition of unconstrained writing, like omnifont …
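
The three rating figures named in the abstract can be illustrated with a short calculation. This is only a minimal sketch under stated assumptions: the function name `ocr_rates`, the use of `None` to mark a rejected character, and the sample strings are all hypothetical and do not come from the paper. It assumes each input character is either classified correctly, classified incorrectly, or rejected, so the three rates sum to 1.

```python
from typing import Dict, Optional, Sequence


def ocr_rates(truth: Sequence[str],
              predicted: Sequence[Optional[str]]) -> Dict[str, float]:
    """Compute the correct classification, rejection, and error rates.

    Assumption (hypothetical convention): predicted[i] is None when the
    recognizer rejected character i instead of assigning it a label.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if len(truth) == 0:
        raise ValueError("at least one character is required")

    total = len(truth)
    # Rejected characters are flagged by the system, so they are detectable.
    rejected = sum(1 for p in predicted if p is None)
    # Correct classifications: the assigned label matches the ground truth.
    correct = sum(1 for t, p in zip(truth, predicted) if p == t)
    # Errors: a label was assigned, but it is wrong (undetected by the system).
    errors = total - correct - rejected

    return {
        "correct": correct / total,
        "rejection": rejected / total,
        "error": errors / total,
    }


# Hypothetical example: 10 characters, one rejected ("I"), one misread ("E" as "F").
truth = "RECOGNIZED"
predicted = list("RECOGN") + [None] + list("ZFD")
rates = ocr_rates(truth, predicted)
print(rates)  # {'correct': 0.8, 'rejection': 0.1, 'error': 0.1}
```

In this sketch the rejected character is visible to an operator because the system flagged it, whereas the misread character is not, which is why the abstract argues that the error rate is the figure to rate a system by.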

Similar resources

Optical Character Recognition Systems

Optical character recognition (OCR) is the process of classifying optical patterns contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. This chapter presents the basic ideas of OCR needed for a better understanding of the book. The chapter starts with a brief background and history of OCR systems. Then the di...

Optical Character Recognition: an Encompassing Review

Optical character recognition (OCR) is becoming a powerful tool in the field of character recognition nowadays. In the existing globalized environment, OCR can play a vital role in different application fields. Basically, the OCR technique converts images into an editable format. This technique converts images into documents so that we can edit, modify and store data more safely for longt...

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then segments the image into lines, then segments the words into sub-character level piece...

Number Plate Recognition Using Ocr Technique

Automatic Number Plate Recognition (ANPR) is a special form of Optical Character Recognition (OCR). ANPR is an image processing technology that automatically identifies a vehicle from digital pictures of its number plate. In this paper we present an algorithm for vehicle number identification based on Optical Character Recognition (OCR). OCR is used to recognize an optically processed...

Cryptogram Decoding for Optical Character Recognition

Optical character recognition (OCR) systems for machine-printed documents typically require large numbers of font styles and character models to work well. When given a document printed in an unseen font, the performance of those systems degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cryptogram decodin...

Publication date: 1993